chore: avoid unnecessary json marshal #626

Merged: crenshaw-dev merged 3 commits into argoproj:master from avoid-unnecessary-marshal on Sep 17, 2024


crenshaw-dev
Member

We have a utility function stripTypeInformation that JSON-marshals and then unmarshals an unstructured object in order to remove type information before diffing.

The only caller of stripTypeInformation immediately JSON-marshals the new unstructured, i.e. we JSON-marshal the unstructured object twice. Instead of having a separate utility function, we can take advantage of the remarshal function's marshaled copy of the unstructured to unmarshal and perform type-stripping.
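For readers unfamiliar with the type-stripping trick, here is a minimal standard-library sketch of the idea. It is not the gitops-engine code: stripTypes is a hypothetical helper that works on a plain map rather than an unstructured.Unstructured, and it only illustrates why a JSON round trip removes the int-vs-float64 distinction that would otherwise show up as a false difference.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// stripTypes is a hypothetical stand-in for the type-stripping step: it
// round-trips a generic object through JSON so that every number comes back
// as float64, regardless of the Go numeric type it started with.
func stripTypes(in map[string]interface{}) (map[string]interface{}, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return nil, err
	}
	var out map[string]interface{}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// The same logical value held in two different Go types.
	a := map[string]interface{}{"port": int64(443)}
	b := map[string]interface{}{"port": float64(443)}

	// A naive deep comparison reports a difference (int64 vs. float64).
	fmt.Println(reflect.DeepEqual(a, b))

	sa, errA := stripTypes(a)
	sb, errB := stripTypes(b)
	if errA != nil || errB != nil {
		panic("round trip failed")
	}

	// After the round trip both values are float64, so they compare equal.
	fmt.Println(reflect.DeepEqual(sa, sb))
}
```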

I ran a 120s memory profile on an Intuit app controller. remarshal was responsible for 2% of the memory use. A 120s CPU profile showed remarshal as responsible for 3% of the CPU use. So I think the function is worth optimizing.

Testing with a medium-size manifest (about 100 lines) showed the following difference in benchmark:

before: Benchmark_remarshal-16              7328            155438 ns/op           90839 B/op       1338 allocs/op
after:  Benchmark_remarshal-16             10000            119162 ns/op           77498 B/op       1090 allocs/op

So 15% less memory and 23% less CPU time. The difference should be more dramatic with larger manifests.

I didn't commit the benchmark code, but here it is for reference:

func Benchmark_remarshal(b *testing.B) {
	manifest := []byte(`
apiVersion: v1
kind: Service
metadata:
  annotations:
    argocd.argoproj.io/sync-options: ServerSideApply=true
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"argocd.argoproj.io/sync-options":"ServerSideApply=true"},"name":"multiple-protocol-port-svc","namespace":"default"},"spec":{"ports":[{"name":"rtmpk","port":1986,"protocol":"UDP","targetPort":1986},{"name":"rtmp","port":1935,"protocol":"TCP","targetPort":1935},{"name":"rtmpq","port":1935,"protocol":"UDP","targetPort":1935}]}}
  creationTimestamp: '2022-06-24T19:37:02Z'
  labels:
    app.kubernetes.io/instance: big-crd
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:argocd.argoproj.io/sync-options': {}
          'f:labels':
            'f:app.kubernetes.io/instance': {}
        'f:spec':
          'f:ports':
            'k:{"port":1935,"protocol":"TCP"}':
              .: {}
              'f:name': {}
              'f:port': {}
              'f:targetPort': {}
            'k:{"port":1986,"protocol":"UDP"}':
              .: {}
              'f:name': {}
              'f:port': {}
              'f:protocol': {}
              'f:targetPort': {}
            'k:{"port":443,"protocol":"TCP"}':
              .: {}
              'f:name': {}
              'f:port': {}
              'f:targetPort': {}
          'f:type': {}
      manager: argocd-controller
      operation: Apply
      time: '2022-06-30T16:28:09Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:kubectl.kubernetes.io/last-applied-configuration': {}
        'f:spec':
          'f:internalTrafficPolicy': {}
          'f:ports':
            .: {}
            'k:{"port":1935,"protocol":"TCP"}':
              .: {}
              'f:name': {}
              'f:port': {}
              'f:protocol': {}
              'f:targetPort': {}
            'k:{"port":1986,"protocol":"UDP"}':
              .: {}
              'f:name': {}
              'f:port': {}
              'f:protocol': {}
              'f:targetPort': {}
          'f:sessionAffinity': {}
      manager: kubectl-client-side-apply
      operation: Update
      time: '2022-06-25T04:18:10Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:loadBalancer':
            'f:ingress': {}
      manager: kube-vpnkit-forwarder
      operation: Update
      subresource: status
      time: '2022-06-29T12:36:34Z'
  name: multiple-protocol-port-svc
  namespace: default
  resourceVersion: '2138591'
  uid: af42e800-bd33-4412-bc77-d204d298613d
spec:
  clusterIP: 10.111.193.74
  clusterIPs:
    - 10.111.193.74
  externalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: rtmpk
      nodePort: 31648
      port: 1986
      protocol: UDP
      targetPort: 1986
    - name: rtmp
      nodePort: 30018
      port: 1935
      protocol: TCP
      targetPort: 1935
    - name: https
      nodePort: 31975
      port: 443
      protocol: TCP
      targetPort: 443
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
`)
	un := unstructured.Unstructured{}
	err := yaml.Unmarshal(manifest, &un)
	require.NoError(b, err)
	opts := applyOptions(diffOptionsForTest())

	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		remarshal(&un, opts)
	}
}

Contributor

@leoluz leoluz left a comment

Please check my comment.

// Unmarshal again to strip type information (e.g. float64 vs. int) from the unstructured
// object. This is important for diffing since it will cause godiff to report a false difference.
var newUn unstructured.Unstructured
err = json.Unmarshal(data, &newUn)

This doesn't seem to produce the same result as the original code. Previously we stripped the types from obj and marshaled that stripped object into data. In the new code, data just holds the Marshal(obj) result, which could lead to different behaviour. To validate this, it would be better to write unit tests that exercise the type-stripping logic. Checking Pod resource values is probably an easy way to do that:

    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

Member Author

@crenshaw-dev crenshaw-dev Sep 16, 2024

data is used only for another Unmarshal (or, more specifically, for unmarshaling via json.NewDecoder, but the behavior should be the same).

So before we had:

unstructured -> marshal -> unmarshal to unstructured -> marshal -> unmarshal to schema type

Now it will be:

unstructured -> marshal -> unmarshal to schema type

i.e. we're skipping unmarshal to unstructured (which is just map[string]interface{}) and the second marshal.

My belief is that those two steps don't remove any type information that unstructured -> marshal doesn't already remove.

But I'll write some tests to give us a bit more confidence that we're not losing anything important.
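To make the two pipelines concrete, here is a small standard-library sketch. Everything in it is an assumption for illustration: portSpec stands in for the schema type, a plain map stands in for the unstructured object, and oldPath/newPath are hypothetical names. It shows one input for which both paths decode to the same typed value; it is not a proof that they agree for every input.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// portSpec is a hypothetical stand-in for a schema type.
type portSpec struct {
	Name string `json:"name"`
	Port int32  `json:"port"`
}

// oldPath: marshal -> unmarshal to generic map -> marshal -> decode to typed.
func oldPath(in map[string]interface{}) (portSpec, error) {
	var typed portSpec
	data, err := json.Marshal(in)
	if err != nil {
		return typed, err
	}
	var generic map[string]interface{}
	if err := json.Unmarshal(data, &generic); err != nil {
		return typed, err
	}
	data, err = json.Marshal(generic)
	if err != nil {
		return typed, err
	}
	err = json.NewDecoder(bytes.NewReader(data)).Decode(&typed)
	return typed, err
}

// newPath: marshal -> decode to typed, skipping the intermediate round trip.
func newPath(in map[string]interface{}) (portSpec, error) {
	var typed portSpec
	data, err := json.Marshal(in)
	if err != nil {
		return typed, err
	}
	err = json.NewDecoder(bytes.NewReader(data)).Decode(&typed)
	return typed, err
}

func main() {
	in := map[string]interface{}{"name": "https", "port": int64(443)}
	a, errA := oldPath(in)
	b, errB := newPath(in)
	// Both paths succeed and produce the same typed value for this input.
	fmt.Println(errA, errB, a == b)
}
```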

Member

@agaudreault agaudreault left a comment

LGTM and covered by TestRemarshalResources.

Contributor

leoluz commented Sep 16, 2024

@agaudreault, @crenshaw-dev The TestRemarshalResources seems to validate the type. However, it isn't clear to me what effect data holding a different value could have.

@crenshaw-dev crenshaw-dev force-pushed the avoid-unnecessary-marshal branch from 8c91416 to f416878 Compare September 17, 2024 11:36
@crenshaw-dev
Member Author

@leoluz PTAL

The only way data is used is in unmarshaling to reflect.New(reflect.TypeOf(item).Elem()).Interface() (i.e. the schema type).
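For anyone not fluent in reflect, the expression above allocates a fresh zero value of whatever type item points to and returns a pointer to it. A minimal sketch, assuming item is a pointer to a struct (service and newLike are hypothetical names used only for this illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// service is a hypothetical stand-in for a schema type.
type service struct {
	Name string `json:"name"`
}

// newLike returns a pointer to a fresh zero value of the type item points to.
// item must be a pointer; Elem panics for kinds that have no element type.
func newLike(item interface{}) interface{} {
	return reflect.New(reflect.TypeOf(item).Elem()).Interface()
}

func main() {
	item := &service{Name: "existing"}

	// target has dynamic type *service and points at an empty service{}.
	target := newLike(item)

	// Decoding fills the new value and leaves item untouched.
	if err := json.Unmarshal([]byte(`{"name":"from-data"}`), target); err != nil {
		panic(err)
	}
	fmt.Printf("%T %+v\n", target, target)
	fmt.Println(item.Name)
}
```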

The "happy path" is already covered by TestRemarshalResources, i.e. we unmarshal into the schema type without an error. So the only uncovered behavior change is when there's some problem unmarshaling data into unmarshalledObj.

I've added tests to ensure the happy path still works when the input type isn't the json library's preferred type (i.e. float instead of float64).

I've also added a test to ensure that the unhappy path behaves the same (i.e. the input value can't be unmarshaled into the schema type).

I think this covers the relevant cases.
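As a rough illustration of the unhappy path, using only the standard library: a value that cannot be decoded into the schema type fails the same way whether or not the intermediate round trip through a generic map happens first. replicaSpec and decodeTyped are hypothetical names, and a plain map stands in for the unstructured object.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// replicaSpec is a hypothetical stand-in for a schema type.
type replicaSpec struct {
	Replicas int32 `json:"replicas"`
}

// decodeTyped marshals in and decodes it into replicaSpec. When roundTrip is
// true it first goes through a generic map and a second marshal, mimicking
// the old behavior; when false it decodes the first marshal result directly.
func decodeTyped(in map[string]interface{}, roundTrip bool) error {
	data, err := json.Marshal(in)
	if err != nil {
		return err
	}
	if roundTrip {
		var generic map[string]interface{}
		if err := json.Unmarshal(data, &generic); err != nil {
			return err
		}
		if data, err = json.Marshal(generic); err != nil {
			return err
		}
	}
	var typed replicaSpec
	return json.Unmarshal(data, &typed)
}

func main() {
	// A string cannot be decoded into an int32 field.
	in := map[string]interface{}{"replicas": "three"}
	for _, roundTrip := range []bool{true, false} {
		err := decodeTyped(in, roundTrip)
		var typeErr *json.UnmarshalTypeError
		fmt.Println(roundTrip, errors.As(err, &typeErr))
	}
}
```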

Signed-off-by: Michael Crenshaw <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
@crenshaw-dev
Member Author

Refactored the test to make SonarCloud happy.

@rumstead
Member

LGTM here are the tests I ran before and after.

func TestRemarshal2(t *testing.T) {
	type args struct {
		obj []byte
		o   options
	}
	tests := []struct {
		name string
		args args
		want map[string]interface{}
	}{
		{name: "PodFloatCpu", args: args{
			obj: []byte(`
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - image: nginx:1.7.9
    name: nginx
    resources:
      requests:
        cpu: 0.2
        memory: 25600Mi
`),
			o: options{log: textlogger.NewLogger(textlogger.NewConfig())},
		}, want: map[string]interface{}{
			"cpu":    "200m",
			"memory": "25Gi",
		}},
		{name: "PodMilliCoreCpu", args: args{
			obj: []byte(`
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - image: nginx:1.7.9
    name: nginx
    resources:
      requests:
        cpu: 5000m
        memory: 25600Mi
`),
			o: options{log: textlogger.NewLogger(textlogger.NewConfig())},
		}, want: map[string]interface{}{
			"cpu":    "5",
			"memory": "25Gi",
		}},
		{name: "PodMemoryBytes", args: args{
			obj: []byte(`
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - image: nginx:1.7.9
    name: nginx
    resources:
      requests:
        cpu: 5000m
        memory: 25600
`),
			o: options{log: textlogger.NewLogger(textlogger.NewConfig())},
		}, want: map[string]interface{}{
			"cpu":    "5",
			"memory": "25600",
		}},
		{name: "PodMemoryMi", args: args{
			obj: []byte(`
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - image: nginx:1.7.9
    name: nginx
    resources:
      requests:
        cpu: 5000m
        memory: 25600Mi
`),
			o: options{log: textlogger.NewLogger(textlogger.NewConfig())},
		}, want: map[string]interface{}{
			"cpu":    "5",
			"memory": "25Gi",
		}},
		{name: "PodMemoryGi", args: args{
			obj: []byte(`
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - image: nginx:1.7.9
    name: nginx
    resources:
      requests:
        cpu: 5000m
        memory: 25Gi
`),
			o: options{log: textlogger.NewLogger(textlogger.NewConfig())},
		}, want: map[string]interface{}{
			"cpu":    "5",
			"memory": "25Gi",
		}},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			mBefore := unstructured.Unstructured{}
			err := yaml.Unmarshal(tt.args.obj, &mBefore)
			// Stop on a parse failure; the type assertions below would panic otherwise.
			require.NoError(t, err)
			mAfter := remarshal(&mBefore, tt.args.o)
			requestsAfter := mAfter.Object["spec"].(map[string]interface{})["containers"].([]interface{})[0].(map[string]interface{})["resources"].(map[string]interface{})["requests"].(map[string]interface{})
			assert.Equalf(t, tt.want, requestsAfter, "remarshal(%v, %v)", tt.want, requestsAfter)
		})
	}
}

Contributor

@leoluz leoluz left a comment

LGTM.
Tks for the added tests. 🤞🏻

@crenshaw-dev crenshaw-dev merged commit 72bcdda into argoproj:master Sep 17, 2024
3 checks passed
@crenshaw-dev crenshaw-dev deleted the avoid-unnecessary-marshal branch September 17, 2024 17:19